memory constraint
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- North America > United States > Virginia (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > United Kingdom > England > Bristol (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- (2 more...)
Hypothesis Selection with Memory Constraints
Hypothesis selection is a fundamental problem in learning theory and statistics. Given a dataset and a finite set of candidate distributions, the goal is to select a distribution that matches the data as well as possible. More specifically, suppose we have sample access to an unknown distribution $P$ over a domain $\mathcal{X}$ that we know is well-approximated by one of a a class of $n$ distributions (a.k.a.
Constrained deep neural network architecture search for IoT devices accounting for hardware calibration
Deep neural networks achieve outstanding results for challenging image classification tasks. However, the design of network topologies is a complex task, and the research community is conducting ongoing efforts to discover top-accuracy topologies, either manually or by employing expensive architecture searches. We propose a unique narrow-space architecture search that focuses on delivering low-cost and rapidly executing networks that respect strict memory and time requirements typical of Internet-of-Things (IoT) near-sensor computing platforms. Our approach provides solutions with classification latencies below 10~ms running on a low-cost device with 1~GB RAM and a peak performance of 5.6~GFLOPS. The narrow-space search of floating-point models improves the accuracy on CIFAR10 of an established IoT model from 70.64% to 74.87% within the same memory constraints. We further improve the accuracy to 82.07% by including 16-bit half types and obtain the highest accuracy of 83.45% by extending the search with model-optimized IEEE 754 reduced types. To the best of our knowledge, this is the first empirical demonstration of more than 3000 trained models that run with reduced precision and push the Pareto optimal front by a wide margin. Within a given memory constraint, accuracy is improved by more than 7% points for half and more than 1% points for the best individual model format.
TinyTTA: Efficient Test-time Adaptation via Early-exit Ensembles on Edge Devices
The increased adoption of Internet of Things (IoT) devices has led to the generation of large data streams with applications in healthcare, sustainability, and robotics. In some cases, deep neural networks have been deployed directly on these resource-constrained units to limit communication overhead, increase efficiency and privacy, and enable real-time applications. However, a common challenge in this setting is the continuous adaptation of models necessary to accommodate changing environments, i.e., data distribution shifts. Test-time adaptation (TTA) has emerged as one potential solution, but its validity has yet to be explored in resource-constrained hardware settings, such as those involving microcontroller units (MCUs). TTA on constrained devices generally suffers from i) memory overhead due to the full backpropagation of a large pre-trained network, ii) lack of support for normalization layers on MCUs, and iii) either memory exhaustion with large batch sizes required for updating or poor performance with small batch sizes. In this paper, we propose TinyTTA, to enable, for the first time, efficient TTA on constrained devices with limited memory. To address the limited memory constraints, we introduce a novel self-ensemble and batch-agnostic early-exit strategy for TTA, which enables continuous adaptation with small batch sizes for reduced memory usage, handles distribution shifts, and improves latency efficiency. Moreover, we develop the TinyTTA Engine, a first-of-its-kind MCU library that enables on-device TTA.
DiskChunGS: Large-Scale 3D Gaussian SLAM Through Chunk-Based Memory Management
Feldmann, Casimir, Wilder-Smith, Maximum, Patil, Vaishakh, Oechsle, Michael, Niemeyer, Michael, Tateno, Keisuke, Hutter, Marco
Abstract--Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated impressive results for novel view synthesis with real-time rendering capabilities. However, integrating 3DGS with SLAM systems faces a fundamental scalability limitation: methods are constrained by GPU memory capacity, restricting reconstruction to small-scale environments. We present DiskChunGS, a scalable 3DGS SLAM system that overcomes this bottleneck through an out-of-core approach that partitions scenes into spatial chunks and maintains only active regions in GPU memory while storing inactive areas on disk. Our architecture integrates seamlessly with existing SLAM frameworks for pose estimation and loop closure, enabling globally consistent reconstruction at scale. Our method uniquely completes all 11 KITTI sequences without memory failures while achieving superior visual quality, demonstrating that algorithmic innovation can overcome the memory constraints that have limited previous 3DGS SLAM methods. ECENT advances in neural representations for 3D scene reconstruction have revolutionized novel view synthesis, with 3D Gaussian Splatting (3DGS) [1] emerging as an exceptionally efficient and high-quality approach. Unlike volume-based methods [2]-[4] that struggle with rendering speed due to expensive ray marching, 3DGS provides real-time rendering capabilities while maintaining impressive visual fidelity.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse
Wang, Shaojie, Wang, Jinghui, Cui, Yinghan, Chen, Xuxing, Wang, Chao, Huang, Liang, Zhang, Xiaojiang, Peng, Junyi, Wan, Li, Zhang, Haotian, Chen, Bin
In agentic LLM scenarios, an agent's interaction process during a single rollout often exhibits branching behaviors. Due to memory retrieval and concurrent tool executions at certain decision points, the token trajectory of one task evolves into a tree-like structure rather than a linear sequence. However, current training pipelines decompose such tree-structured trajectories into separate linear segments, treating each branch as an independent sequence. As a result, shared prefixes across these branches are repeatedly recomputed during both forward and backward passes. To address this inefficiency, we propose Tree Training, a paradigm that computes each shared prefix only once and reuses its intermediate results across related branches during both forward and backward passes, substantially improving computation efficiency in large-scale agentic training. This is achieved via (i) Tree Packing, which efficiently reuses shared computations across trajectories, and (ii) Gradient Restoration, which ensures correct gradient propagation across reused prefixes. Experiments on multiple open-source models demonstrate up to 3.9x reduction in total training time, enabling more efficient agentic LLM SFT and RL training.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Memory Constrained Dynamic Subnetwork Update for Transfer Learning
Quélennec, Aël, Mozharovskyi, Pavlo, Nguyen, Van-Tam, Tartaglione, Enzo
On-device neural network training faces critical memory constraints that limit the adaptation of pre-trained models to downstream tasks. We present MeDyate, a theoretically-grounded framework for memory-constrained dynamic subnetwork adaptation. Our approach introduces two key innovations: LaRa (Layer Ranking), an improved layer importance metric that enables principled layer pre-selection, and a dynamic channel sampling strategy that exploits the temporal stability of channel importance distributions during fine-tuning. MeDyate dynamically resamples channels between epochs according to importance-weighted probabilities, ensuring comprehensive parameter space exploration while respecting strict memory budgets. Extensive evaluation across a large panel of tasks and architectures demonstrates that MeDyate achieves state-of-the-art performance under extreme memory constraints, consistently outperforming existing static and dynamic approaches while maintaining high computational efficiency. Our method represents a significant step towards enabling efficient on-device learning by demonstrating effective fine-tuning with memory budgets as low as a few hundred kB of RAM.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > France (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)